Module 3 Lecture - Transformations and Non-parametric Comparisons for Two Groups

Analysis of Variance

Quinton Quagliano, M.S., C.S.P

Department of Educational Psychology

1 Overview and Introduction

Agenda

1 Overview and Introduction

2 Solutions for Assumption Violations

3 Correct for Violations in Data

4 Use Non-parametric Tests

5 Conclusion

1.1 Objectives

  • In Module 2, we discussed the intricacies of avoiding Type I and Type II errors in hypothesis testing, the connection to statistical power, and how assumption violations can inflate Type II error rates

  • In this module, students should be able to:

    • Understand the different options available for handling assumption violations
    • Appreciate the use of variable transformations to address skew, and of trimming/winsorizing to address kurtosis, i.e., to solve problems with normality
    • Understand non-parametric tests as viable alternatives to our usual parametric tests, with certain nuances to be aware of in application.

1.2 Introduction

  • Previously, we talked about identifying problems in 3 types of common assumptions:
    • Normality
    • Homogeneity of variances
    • Independence of sampling variables
  • Now we will go through the Solutions for Assumption Violations, and how they may (or may not) solve our problems
    • Our data is rarely ideal! We might need to readily consider these options
  • Discuss: Off the top of your head, try to remember the different methods of identifying the above assumption violations

2 Solutions for Assumption Violations


2.1 3 General Strategies

  • Important: There is considerable controversy and conflicting advice in this area as to what we should do - take a non-parametrics class for more fun!
  • We need a way to address assumption violations when they occur (well, mostly - see Do Nothing!)
    • As mentioned before, the main risk of assumption violations is that they reduce power and raise the Type II error rate
    • Since we come into most studies hypothesizing that differences exist, we want to plan an analysis with sufficient power to detect that hypothesized difference
  • We have 3 strategies for approaching assumption violations: do nothing, correct the data, or use non-parametric tests
  • Before we go through those, let’s talk about some reasonable advice for working through them:

2.2 Advice in ‘Fixing’ Assumption Violations

  • First, always report checks/tests for assumptions of the test you are using
    • E.g., normality tests like Kolmogorov-Smirnov and Shapiro-Wilk; histograms to examine skew and kurtosis; Levene’s test for homogeneity of variances, etc.
    • Review notes from module 2 for more details on each of those
  • Second, you should report assumption checks both before AND after you attempt a fix
    • I.e., if you transform/trim a variable, you should check assumptions both before AND after the transformation/trim
    • Re-check ALL assumptions, not just the ones you attempted to fix with the transformation or trimming
  • Third, be honest and transparent when it seems like a solution does not work or when the downsides outweigh the results
    • We’ll talk about different pitfalls when using each strategy
    • It’s good practice to always be straightforward about limitations of our analyses
  • Important: The emphasis is always on transparently reporting results!

2.3 Do Nothing!

  • Without going into too much detail, we are concerned with how robust a test is, i.e., how resilient it is to assumption violations and how well it works under less-than-ideal circumstances

  • Some researchers hold that many commonly used tests, e.g., t-tests, are reasonably robust at baseline to assumption violations

    • This becomes even more true at large \(n\) (see the discussion in the Sample Size section)
    • However, when samples are small or the groups being compared are unequal in size, this route cannot be recommended
  • Discuss: How often do you think this option is taken in 'real' research?

3 Correct for Violations in Data


  • There are several options for selectively transforming and/or trimming our data to correct for certain patterns of skewness, kurtosis, or outliers
    • However, be aware these are not panaceas - they come with their own issues

3.1 Transformations / Dealing with Skewness

  • Making mathematical variable transformations is largely used to address the normality assumption, but may indirectly solve other issues as well

  • The exact transformation is dependent on the type of problem, specifically the skew:

    • Positive/right skew: Logarithmic (Severe) or Square Root (Moderate)
      • Logarithmic method: \(\log_{10}(X)\)
      • Square root method: \(\sqrt{X}\)
      • \(X\) is the variable of interest
    • Negative/left skew: Reflect and Logarithmic (Severe) or Reflect and Square Root (Moderate)
      • Logarithmic method: \(\log_{10}(K - X)\)
      • Square root method: \(\sqrt{K - X}\)
      • \(K\) is a constant from which each score is subtracted, so that the smallest score is equal to 1
  • Advantages:

    • Makes use of all available data
    • Allows for use of a traditional well-known technique.
  • Disadvantages:

    • Interpretability can be questionable
    • Fixing one assumption violation can create others
  • Question: I have a variable of numeric test scores, in which a histogram shows severe bunching of scores to the left, with a tail off to the right. Which of the above transformations might be the best to try first?
    • A) Log
    • B) Square Root
    • C) Reflect and Log
    • D) Reflect and Square Root
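As a concrete sketch of the transformations above, here is how they might be applied in Python; the data values are made up for illustration (a positively skewed variable), and the skewness check uses SciPy's sample skewness:

```python
# Sketch of the positive-skew transformations above; the data values
# are hypothetical. Note the log requires strictly positive scores -
# add a constant first if zeros are present.
import numpy as np
from scipy.stats import skew

x = np.array([1, 1, 2, 2, 3, 3, 4, 5, 8, 15, 40, 90], dtype=float)

sqrt_x = np.sqrt(x)    # square root: for moderate positive skew
log_x = np.log10(x)    # log10: for severe positive skew

# Re-check skewness after transforming (report before AND after!)
print(skew(x), skew(sqrt_x), skew(log_x))
```

Both transformations compress the right tail, so the sample skewness should shrink, with the log compressing more aggressively than the square root.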

3.2 Trimming and Winsorizing / Dealing with Kurtosis

  • Another option, particularly useful for negatively kurtotic (platykurtic) distributions (relatively flat distributions with an unusually large number of observations in the tails), is variable trimming.

  • A trimmed sample is a sample where a fixed percentage of extreme values is removed from each tail.

    • Of course, if you are comparing groups, you would want to trim the same percentage from the tails of both groups to be fair.
    • 20 percent overall, i.e., 0.10 from each tail, is the most common amount to trim
    • Example Mean: \(((6.0 + 8.1 + 8.3 + 9.1 + 9.9) / 5) = 8.28\)
    • Example 20% Trimmed Mean: \((8.1 + 8.3 + 9.1) / 3 = 8.50\)
  • Another related option is using winsorizing

    • A Winsorized sample replaces the trimmed values by the most extreme value remaining in each tail.
    • Example Dataset: 2, 4, 7, 8, 11, 14, 18, 23, 23, 27, 35, 40, 49, 50, 55, 60, 61, 61, 62, 75
    • Example 20% Winsorized Dataset: 7, 7, 7, 8, 11, 14, 18, 23, 23, 27, 35, 40, 49, 50, 55, 60, 61, 61, 61, 61
    • Note: degrees of freedom (df) for a test on a trimmed or Winsorized sample must be adjusted for the data trimming. For both, subtract the number of trimmed cases from your total N to get a new value for N. (We do this even for Winsorized samples because the replacement values are really pseudovalues.)
  • Advantages:

    • Allows for use of a traditional well-known technique.
    • Interpretability of variable remains intact
  • Disadvantages:

    • Loss of information - what if those outliers were an important part of the phenomenon
  • Discuss: Using the variable data 1, 2, 3, 4, 5, try trimming and winsorizing this data by 20 percent
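As a sketch, SciPy provides both operations directly; the data here are the example scores from the trimmed-mean illustration above:

```python
# Trimmed mean and winsorizing with SciPy, using the example scores above.
import numpy as np
from scipy.stats import trim_mean
from scipy.stats.mstats import winsorize

scores = np.array([6.0, 8.1, 8.3, 9.1, 9.9])

print(scores.mean())           # 8.28
print(trim_mean(scores, 0.2))  # 8.5 - drops one score from each tail here

# Winsorizing replaces, rather than removes, the extreme values
print(winsorize(scores, limits=[0.2, 0.2]))  # [8.1 8.1 8.3 9.1 9.1]
```

Note that trim_mean trims the stated proportion from each tail, so the before/after assumption checks (and the df adjustment above) still apply.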

4 Use Non-parametric Tests


4.1 Introduction

  • Non-parametric tests are distribution-free statistical tests that are not based on parameters (e.g., means or standard deviations) or on assumptions about the normality of the underlying data distribution.

  • They do still have some assumptions, just not about normality! More on that in Assumptions of Non-parametric Tests

  • Instead, non-parametric tests are based on quantities such as percentages (Chi-square) or ranks, i.e., ordinal or ordinal-transformed data

  • Non-parametric tests are nearly as powerful as traditional tests when assumptions hold, and under situations of violated assumptions can be much more powerful

  • We’ll discuss the Wilcoxon Rank Sum Test and the Mann Whitney U Test, as they are non-parametric tests comparing the rank orderings of two independent samples

    • These can be thought of as the non-parametric versions of the independent samples t-test
  • We’ll also cover the Wilcoxon Matched Pairs Signed Rank Test, sort of an analog to the dependent-samples t-test

  • Because these tests use ranks, they can actually use ordinal data, unlike the t-test!

  • Discuss: Think of a hypothesis/scenario in which you'd use the independent samples t-test

4.2 Wilcoxon Rank Sum Test

  • The Wilcoxon Rank Sum Test is based on the logic that if there truly is a difference between two groups, the ranks from one group should generally be lower than the ranks from the other group.

  • Following from that, if the groups are different, the sum of the ranks from one group should be lower than the sum of the ranks from the other group.

    • If the sum of ranks for one group is too small relative to the other, we will reject the null hypothesis.

Calculating Test Statistic

  • First, combine both groups and rank their individual values from smallest (starting at 1) to largest
    • In the scenario of ties, you can assign them all the mean rank or assign adjacent ranks at random
Group A Scores (\(n_1 = 4\)): 85, 92, 88, 95
Group B Scores (\(n_2 = 5\)): 70, 82, 75, 78, 80
Score Group Rank
70 B 1
75 B 2
78 B 3
80 B 4
82 B 5
85 A 6
88 A 7
92 A 8
95 A 9
  • Second, calculate the sum of the ranks of the smaller group, also called \(W_s\).
    • The “smaller” group is the one with a smaller \(n\)
Group A Ranks: 6, 7, 8, 9; Sum (\(W_A\)) = 30
Group B Ranks: 1, 2, 3, 4, 5; Sum (\(W_B\)) = 15
  • Third, compare \(W_s\) against critical value table, which is derived from sample sizes from the smaller and larger groups
    • Realistically, we’ll use SPSS for this
  • Discuss: Try calculating the rank sums for the groups A: 23, 34, 89 and B: 16, 12, 40, 50; which of these sums is \(W_s\)?
  • Important: Contrary to what we are used to, we want our test statistic to be *less* than the critical value
  • This is a one-tailed test that checks whether the ranks of the smaller group are sufficiently smaller than those of the larger group.
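The rank-sum steps above can be sketched in Python, using SciPy's rankdata (which assigns tied values their mean rank, matching the first option above):

```python
# Rank-sum calculation for the example above: combine, rank, sum per group.
import numpy as np
from scipy.stats import rankdata

group_a = np.array([85, 92, 88, 95])      # n1 = 4 (the smaller group)
group_b = np.array([70, 82, 75, 78, 80])  # n2 = 5

# Rank the combined values from smallest (1) to largest; ties get mean ranks
ranks = rankdata(np.concatenate([group_a, group_b]))

w_a = ranks[:len(group_a)].sum()  # sum of Group A's ranks -> 30; this is W_s
w_b = ranks[len(group_a):].sum()  # sum of Group B's ranks -> 15
print(w_a, w_b)
```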

When Smaller Group Has Larger Ranks

  • A problem we may run into in the above process is if the smaller group (by \(n\)) actually has the larger ranks
    • To solve for this, we can backwards rank the data, i.e., start with the largest value as 1
  • We can check for that possibility by calculating \(W_s'\), the complement of \(W_s\)
    • \(W_s' = 2\bar{W} - W_s\)
  • Where:
    • \(2\bar{W} = n_1(n_1 + n_2 + 1)\)
    • \(n_1\): sample size of smaller sample
    • \(n_2\): sample size of the larger sample
  • Important: So effectively, we have to consider the statistic both ways to account for both directions. This gets covered by our next test.
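Plugging the Section 4.2 example into the complement formula, as a quick arithmetic check:

```python
# Complement W_s' for the earlier example (n1 = 4, n2 = 5, W_s = 30)
n1, n2 = 4, 5
w_s = 30

two_w_bar = n1 * (n1 + n2 + 1)  # 2W-bar = 4 * 10 = 40
w_s_prime = two_w_bar - w_s     # W_s' = 40 - 30 = 10
print(w_s_prime)                # 10
```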

4.3 Mann Whitney U Test

  • The Mann Whitney U Test is completely equivalent to the Wilcoxon Rank Sum Test; however, it is slightly more complex in order to eliminate the need to compute \(W_s'\).
  • Important: For this reason, it tends to be a bit more immediately useful, because it doesn't require going through the equation twice.
  • The Mann Whitney U, like the Wilcoxon Rank Sum, is based on ranked data.

Calculating Test Statistic

  • We start with the exact same steps as the Wilcoxon Rank Sum Test, calculating \(W_s\)

  • Then we calculate the \(U\) statistic:

\[ U = \frac{n_1(n_1 + 2n_2 + 1)}{2} - W_s \]

  • Where:
    • \(n_1\): sample size of smaller sample
    • \(n_2\): sample size of the larger sample
  • Important: SPSS will only calculate Mann Whitney U, but because these tests are equivalent, that is fine!
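As a sketch, we can check the lecture's \(U\) formula against SciPy's mannwhitneyu on the Section 4.2 example. Note SciPy reports the \(U\) associated with the first sample, and the two possible \(U\) values always sum to \(n_1 n_2\); for these data the formula gives the smaller of the two:

```python
# U from the lecture formula vs. scipy.stats.mannwhitneyu, using the
# example data from Section 4.2.
import numpy as np
from scipy.stats import mannwhitneyu, rankdata

group_a = np.array([85, 92, 88, 95])      # n1 = 4 (the smaller group)
group_b = np.array([70, 82, 75, 78, 80])  # n2 = 5
n1, n2 = len(group_a), len(group_b)

# W_s: sum of ranks of the smaller group, as in the Rank Sum Test
ranks = rankdata(np.concatenate([group_a, group_b]))
w_s = ranks[:n1].sum()                          # 30.0

u_manual = n1 * (n1 + 2 * n2 + 1) / 2 - w_s     # 0.0 for these data

# The two possible U values sum to n1*n2; take the smaller one
res = mannwhitneyu(group_a, group_b, alternative="two-sided")
u_smaller = min(res.statistic, n1 * n2 - res.statistic)
print(u_manual, u_smaller)                      # both 0.0 here
```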

4.4 Wilcoxon Matched Pairs Signed Rank Test

  • The Wilcoxon Matched Pairs Signed Rank Test is a non-parametric test of the null hypothesis that two related samples were drawn from identical populations with the same mean.
    • This is a non-parametric version of the dependent/paired samples t-test.
  • The logic of the Signed Ranks Test rests on measuring the direction and magnitude of change.

Calculating Test Statistic

  • Compute difference scores (\(d\)) between time 1 (\(t_1\)) and time 2 (\(t_2\))
Participant Before After Difference (\(d_i\))
1 120 115 -5
2 135 130 -5
3 110 120 +10
4 145 145 0
5 130 110 -20
6 125 122 -3
  • Next, rank regardless of sign
\(|d_i|\) Absolute Rank Sign
3 1 Negative
5 2.5 Negative
5 2.5 Negative
10 4 Positive
20 5 Negative
  • There are two kinds of ties possible this time:
    • Like before, you can have rank ties which you would handle by either random ranks or tied ranks.
    • Now, though, you have the possibility for paired values to be equal (difference = 0). When this happens do not rank that case, and drop it from your sample.
  • Next, sum the positive and negative ranks
Absolute Rank Positive Ranks (\(R^+\)) Negative Ranks (\(R^-\))
1 1
2.5 2.5
2.5 2.5
4 4
5 5
Sum (\(W\)) \(W^+ = 4\) \(W^- = 11\)
  • Our test statistic is \(T\), which is the smaller of the two rank sums

4.5 Assumptions of Non-parametric Tests

  • These mentioned non-parametric tests allow us to ignore normality problems in our data
    • However, we still need to test for and address problems in homogeneity of variances
    • See the prior module’s lecture on Levene’s test for help identifying these issues

4.6 Effect Size Under Non-parametric Tests

  • We can use this quick equation and rule-of-thumb for effect size under these tests

\[ r = \frac{z}{\sqrt{n}} \]

  • Where:
    • \(z\) is the standardized test value
    • \(n\) is the sample size
  • Interpretation
    • Small: 0.1
    • Medium: 0.3
    • Large: 0.5
  • Important: Like with many statistics, there are *many* other ways of calculating effect size, some with much more complex equations - but for the scope of this class, we'll use this
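The equation above is a one-liner in Python; the z and n values below are hypothetical, standing in for the standardized test value and sample size from the software output:

```python
import math

def rank_effect_size(z: float, n: int) -> float:
    """Effect size r = z / sqrt(n) for rank-based tests."""
    return z / math.sqrt(n)

# Hypothetical values: z = 1.4 from the test output, n = 49 cases
r = rank_effect_size(1.4, 49)
print(round(r, 3))  # 0.2
```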

4.7 Advantages and Disadvantages of Non-parametric Tests

  • Advantages of Non-Parametrics
    • Interpretability of variables remains intact
    • Makes use of all available data
    • We still have good options for statistical significance testing, effect size, and other normal comforts of parametric tests
  • Disadvantages
    • Non-parametric tests are relatively unknown. Many researchers would not know how to interpret them, and thus, may prefer just to see the parametric version, regardless of issues
  • Discuss: Do you think this disadvantage is a sufficient reason to avoid these tests? What issues (if any) do you have with this view?

5 Conclusion


5.1 Recap

  • We have now discussed various methods we may pursue when dealing with oddities in our data that result in assumption violations

  • Some of these strategies involve carefully modifying our data with transforming, trimming, or winsorizing; some strategies involve alternative selection of tests; and then finally, we may do nothing (mindfully)

  • Each of our options results in some new things to be careful about, like transformation changing the scale of our data, trimming removing data, or non-parametric tests changing the interpretation of results

  • Our discussion of the non-parametric analogs for the t-test will help lead us into a similar discussion on alternative tests to the one-way ANOVA (and others) later in the semester

  • Above all, we must be mindful of transparency when working through assumption problems

5.2 Lecture Check-in

  • Make sure to complete any lecture check-in tasks associated with this lecture!
